ABSTRACT

This article describes the patient-centered Scalable 
National Network for Effectiveness Research 
(pSCANNER), which is part of the recently formed 
PCORnet, a national network composed of learning 
healthcare systems and patient-powered research 
networks funded by the Patient Centered Outcomes 
Research Institute (PCORI). It is designed to be a 
stakeholder-governed federated network that uses a 
distributed architecture to integrate data from three 
existing networks covering over 21 million patients in all 
50 states: (1) VA Informatics and Computing 
Infrastructure (VINCI), with data from the Veterans Health
Administration's 151 hospitals and 909 ambulatory care
and community-based outpatient clinics; (2) the 
University of California Research exchange (UC-ReX) 
network, with data from UC Davis, Irvine, Los Angeles, 
San Francisco, and San Diego; and (3) SCANNER, a 
consortium of UCSD, Tennessee VA, and three federally 
qualified health systems in the Los Angeles area 
supplemented with claims and health information 
exchange data, led by the University of Southern 
California. Initial use cases will focus on three 
conditions: (1) congestive heart failure; (2) Kawasaki 
disease; (3) obesity. Stakeholders, such as patients, 
clinicians, and health service researchers, will be 
engaged to prioritize research questions to be answered 
through the network. We will use a privacy-preserving 
distributed computation model with synchronous and 
asynchronous modes. The distributed system will be 
based on a common data model that allows the 
construction and evaluation of distributed multivariate 
models for a variety of statistical analyses. 



PARTICIPATING HEALTH SYSTEMS 

The patient-centered Scalable National Network 
for Effectiveness Research (pSCANNER) is a con- 
sortium of three existing networks that together 
constitute a highly diverse patient population with 
respect to insurance coverage, socioeconomic 
status, demographics, and health conditions. 
Table 1 summarizes key characteristics of the insti- 
tutions within each existing network. 

VA Informatics and Computing Infrastructure 
(VINCI; http://www.hsrd.research.va.gov/for_ 
researchers/vinci/) is a major informatics initiative 
of the Veterans Health Administration (VHA) that 
provides a secure, central platform for performing 
research and supporting clinical operations activities.
In addition to national data, VINCI hosts commercial
and custom analytical software for natural
language processing, annotation, data exploration,
and epidemiological analysis. The VHA treats 8.76 
million veterans within an integrated healthcare 
delivery system, which includes hospitals, outpatient 
pharmacies, ancillary care facilities, and laboratory 
and radiology services. Since the 1980s, all VA facil- 
ities have used the same electronic health record 
(EHR) system, named Veterans Information System 
Technology Architecture (VistA). On a nightly basis, 
the VA's clinical data warehouse (CDW) is updated 
from VistA. This CDW stores vast amounts of clin- 
ical data dating back to 2000. The CDW and other 
VHA patient-level data remain behind VA firewalls. 
However, within pSCANNER, VINCI patient data 
will be available for consultation through privacy- 
preserving, distributed computing methodology. 
The VA team within pSCANNER will continue to 
develop analytical modules and map its CDW to the 
Observational Medical Outcomes Partnership 
(OMOP) common data model. 1 

The University of California Research exchange 
(UC-ReX; http://www.ucrex.org) was established in 
2010 and has been funded by the University of 
California Office of the President since 2011 
through the UC-Biomedical Research Acceleration, 
Integration, and Development (UC-BRAID)
Initiative, which streamlines operations within the 
UC system such as institutional review board (IRB) 
activity, biorepository coordination, and other 
research activities. UC-ReX is also supported by the 
National Institutes of Health (NIH) Clinical 
Translational Science Awards from five UC Health 
Systems (UC Davis, Irvine, Los Angeles, 
San Francisco, and San Diego). UC-ReX has EHR 
data for over 12 million patients, and has used the 
i2b2 2 data model for cohort discovery based on a 
limited set of variables related to demographics, diag- 
noses, laboratory tests, and medications. In 
pSCANNER, UC-ReX will map its CDWs to the 
OMOP model. 

Scalable National Network for Effectiveness 
Research (SCANNER; http://scanner.ucsd.edu) was 
established in late 2010 with Agency for Healthcare 
Research and Quality (AHRQ) funds. Its goal was to 
provide a secure, scalable distributed infrastructure 
to facilitate comparative effectiveness research 3 
among widely dispersed institutions, and to provide 
flexibility to participant sites in the means for data- 
driven collaboration. SCANNER developed a com- 
prehensive service-oriented framework for mapping 
policy requirements into network software and 
data operations, 4 and demonstrated ability to 

Table 1 System characteristics of pSCANNER's participating institutions

Institution                                     Number of patients   Number of hospitals/clinics                                                     EHR used
University of California, San Diego (UCSD)      2.2 million          4 hospitals, 142 clinics                                                        Epic
University of California, Los Angeles (UCLA)    4.1 million          3 hospitals, 300 clinics                                                        Epic
University of California, San Francisco (UCSF)  3 million            1 hospital, 463 clinics                                                         Epic
San Francisco General Hospital (SFGH)           0.5 million          1 hospital, 28 clinics                                                          Lifetime Clinical Record
University of California, Davis (UCD)           2.2 million          1 hospital, 77 clinics                                                          Epic
University of California, Irvine (UCI)          1.4 million          1 hospital, 184 clinics                                                         Allscripts - Sunrise
Veterans Affairs (VA)                           8.7 million          151 hospitals, 909 ambulatory care and community-based outpatient clinics      VistA
AltaMed                                         0.3 million          31 clinics                                                                      NextGen
QueensCare Family Clinics                       19 000               6 clinics                                                                       Sage
The Children's Clinic (TCC)                     24 000               5 clinics                                                                       Epic


Each institution is listed with its respective number of patients, number of hospitals/clinics, and the electronic health record (EHR) system. 
interoperate with institutions that were not in its initial member
list. SCANNER services were adopted in interventional compara- 
tive effectiveness trials led by the University of Southern 
California (USC) team and funded by the National Institute on 
Aging (NIA). Four interventional studies (three randomized trials 
and one quasi-randomized trial) have been carried out on the 
network. Through these studies, SCANNER's data model was
further developed to capture important interventional variables 
and economic outcomes. Sites included three federally qualified 
health systems in the Los Angeles area: AltaMed, QueensCare 
Family Clinics, and The Children's Clinic of Long Beach. The 
SCANNER team has collaborated with other networks in the 
development and/or utilization of standards and tools (eg, data 
model development of OMOP V4.0 in collaboration with inves- 
tigators from SAFTINet 5 ). 



USE CASES AND PARTICIPATORY RESEARCH

Since 2010, components of SCANNER technology have facilitated
scaling from the original AHRQ- and NIA-sponsored
studies to an additional intervention study involving patient-reported
data in an obesity cohort at QueensCare Family Clinics
and an independently funded Center for Medicare & Medicaid
Innovation (CMMI) study targeting patients with congestive
heart failure and patients with high body mass index. The
SCANNER team also completed engagement research that
included patients (six focus groups, 6 California statewide survey
of consumers, VA patient surveys, patient navigator preliminary
surveys, as well as a national survey). Findings from these
studies informed the approach to patient engagement in governance
and in prioritization of research questions that will be used
in pSCANNER. Initial use cases in pSCANNER will focus on
three conditions: (1) congestive heart failure; (2) obesity; (3)
Kawasaki disease. Kawasaki disease, a rare disease, is an acute
vasculitis that causes heart disease in children and young
adults. 7 Patients/caregivers, clinicians, researchers, and administrators
who represent these conditions will be recruited from
participating clinical sites and advocacy and patient organizations
to participate at multiple levels from stakeholder input, to
advisory board, to national committee. Patient leaders, such as
the patient co-chair of the steering committee and patient
co-chairs of the advisory board, will be key decision-makers in
the development of guidelines that will apply to all three conditions
for the whole network. They will also help design the best
approaches for recruiting and engaging patients at all levels of
governance. One example of our engagement process is the use
of a systematic approach to achieve consensus on identification
and prioritization of research questions to be studied in the
network. We will apply the RAND/UCLA modified Delphi
Appropriateness Method, a deliberative and iterative approach
to attaining consensus through discussion and feedback. 8 While
in-person engagement may be desirable, it is often not pragmatic:
events can be time- and cost-prohibitive for larger groups
of stakeholders. Figure 1 depicts our planned method of
engagement. To convene large (n=360) regional or national




Stakeholder Groups ExpertLens Process 

Figure 1 Stakeholder engagement. Stakeholders from across 
patient-centered Scalable National Network for Effectiveness Research 
(pSCANNER) sites (360 total) will be recruited to participate in a 
three-round ExpertLens process, which will prioritize research questions 
that should be addressed by pSCANNER. In round 1, participants will 
rate different research priorities and research questions. In round 2, 
medians and quartiles of group responses to each question will be 
presented to the participants. In round 3, participants will be asked to 
modify their round 1 responses based on round 2 feedback and 
discussion. 
samples of stakeholders representing patients, clinicians, and
researchers across the three different health conditions, we will 
apply an online Delphi process. Deliberative research has shown 
that offering data on opinions from subgroups of discussants as 
well as the overall group 9 can facilitate more effective conclu- 
sions. We will use RAND's online Delphi consensus management 
system, called ExpertLens, 10 which allows stakeholders with dif- 
ferent expertise to weigh in on all questions in order to float 
issues that might not be considered by a less diverse group. The 
system has been used in mixed stakeholder groups, including 
patients and clinicians, and found to facilitate rapid consensus 
panels for selecting research questions with large groups of stake- 
holders with much greater efficiency than the traditional Delphi 
process. 11 Using this approach, pSCANNER will have the ability 
to capture consensus input from a large group of participating 
stakeholders in a systematic manner to drive research priorities. 
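
As an illustration of the round 2 feedback described in the figure 1 caption (medians and quartiles of group responses), the following minimal sketch summarizes round 1 ratings per research question. It is not ExpertLens code: the function name, the question labels, the ratings, and the 1-9 rating scale are all assumptions made for the example.

```python
from statistics import median, quantiles


def round2_feedback(ratings_by_question):
    """Summarize round 1 ratings per question as quartiles and median."""
    summary = {}
    for question, ratings in ratings_by_question.items():
        # method="inclusive" treats the ratings as the whole panel, not a sample
        q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
        summary[question] = {
            "q1": q1,
            "median": median(ratings),
            "q3": q3,
            "iqr": q3 - q1,  # a wide spread suggests the panel disagrees
        }
    return summary


# Hypothetical round 1 ratings from seven panelists (assumed 1-9 scale)
ratings = {
    "Q1": [7, 8, 9, 6, 8, 7, 9],
    "Q2": [2, 5, 8, 3, 9, 4, 6],
}
feedback = round2_feedback(ratings)
for question, stats in feedback.items():
    print(question, stats)
```

A question with a wide interquartile range, such as the hypothetical Q2 here, would be a natural candidate for the round 2 discussion before participants revise their responses in round 3.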

STANDARDS FOR REPRESENTING DATA ELEMENTS AND 
DATA PROCESSING 

pSCANNER will adhere to recognized terminologies, including 
meaningful use or Centers for Medicare & Medicaid Services 
(CMS) billing terminology standards for diagnosis codes (The 
International Classification of Diseases, Ninth Revision, Clinical 
Modification (ICD-9-CM), ICD-10-CM or SNOMED Clinical 
Terms (SNOMED CT)), procedures and test orders (Current 
Procedural Terminology (CPT) or Healthcare Common 
Procedure Coding System (HCPCS)), and medications (National 
Drug Code (NDC) or RxNorm). In some cases, the native 
encoding for laboratory information is Logical Observation 
Identifiers Names and Codes (LOINC); in other cases, these 
mappings are translated and maintained as part of research data 
warehouses. 

Federally incentivized standards for terminologies and struc- 
tures have been established for communicating single records; 
indeed, patients themselves can request these digital 'continuity 
of care documents' and relay them to Clinical Data Research 
Networks (CDRNs) or Patient-Powered Research Networks 
(PPRNs). However, multisite PCOR analysis requires
communicating rules for processing raw population-level data
into prepared analytic datasets. There are emerging standards 
from Health Level Seven (HL7), the Health Quality Measures 
Format (HQMF), and Quality Reporting Document 
Architecture (QRDA) specifications for datasets and abstract 
data processing rules for electronic quality measures. These stan- 
dards have not yet become fully integrated into federally incen- 
tivized data policy. In SCANNER, we adopted a syntax that is 
compatible with the HQMF and QRDA because these standards
have some preliminary endorsement by the National Quality 
Forum, CMS, and Office of the National Coordinator as a 
means of specifying population-level datasets for electronic 
quality measures. pSCANNER datasets will be specified in this 
syntax, which can then be interpreted by an adapter to generate 
executable queries (SCANNER implemented an adapter for 
OMOP V4.0 12 ). All institutions in pSCANNER have agreed to 
standardize their data model to OMOP and to install a 
pSCANNER node to allow distributed computing, which
extends the network's capabilities from distributed count queries to
multivariate analytics.
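
The idea of an adapter that turns an abstract dataset specification into an executable query can be sketched as follows. This is not the SCANNER adapter and the specification is not HQMF syntax: the dictionary format, the function name, and the concept identifier 1001 are hypothetical. Only the table and column names (person, condition_occurrence, person_id, year_of_birth, condition_concept_id) follow the OMOP common data model, and the toy database is built in memory so the example runs on its own.

```python
import sqlite3


def spec_to_sql(spec):
    """Translate a toy dataset specification into a parameterized count query."""
    placeholders = ", ".join("?" for _ in spec["condition_concept_ids"])
    sql = (
        "SELECT COUNT(DISTINCT p.person_id) "
        "FROM person p "
        "JOIN condition_occurrence c ON c.person_id = p.person_id "
        f"WHERE c.condition_concept_id IN ({placeholders}) "
        "AND p.year_of_birth <= ?"
    )
    params = list(spec["condition_concept_ids"]) + [spec["born_on_or_before"]]
    return sql, params


# Toy stand-in for one site's OMOP tables (only the columns used here)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 1945);
    INSERT INTO condition_occurrence VALUES (1, 1001), (1, 1001), (2, 1001), (3, 2002);
    """
)

# Hypothetical specification: patients with concept 1001 born in 1960 or earlier
spec = {"condition_concept_ids": [1001], "born_on_or_before": 1960}
sql, params = spec_to_sql(spec)
count = conn.execute(sql, params).fetchone()[0]
print(count)
```

Because the specification is separate from the generated SQL, the same specification could in principle be sent to every site and interpreted locally against that site's copy of the common data model.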

Figure 2 illustrates how pSCANNER will operate. The stand- 
ard operating procedures for data harmonization will include 
well-defined steps for data modeling and quality control using 
tools that have been developed and continue to be developed by 
the OMOP data management collaborative. 13-16 All steps will 
be published in a standard format to ensure that the data within 
the network will adhere to standard operating procedures, and 
that they can be easily shared with other members of PCORnet 
that adopt OMOP or similar models. Members of pSCANNER 
are active participants in the Data Quality Assessment collabora- 
tive (http://repository.academyhealth.org/dqc/), and have devel- 
oped additional tools for quality auditing and assessing validity 
and fitness for use in both research and other secondary uses of 
population-level data. 17-19 

PRIVACY, POLICY, AND TECHNOLOGY 

pSCANNER addresses institutional policies and patient prefer- 
ences for data sharing by leveraging recent privacy policy study 






Figure 2 Patient-centered Scalable National Network for Effectiveness Research (pSCANNER) architecture. pSCANNER is a clinical data research 
network that will integrate over 21 million patients. It will use privacy and security tools to enable distributed analysis of data while keeping data in 
their host institutions and adhering to all applicable federal, state, and institutional policies. k, thousand; m, million; OMOP, Observational Medical
Outcomes Partnership; QueensCare, QueensCare Family Clinics; SFGH, San Francisco General Hospital; TCC, The Children's Clinic of Long Beach;
UCD, University of California, Davis; UCI, University of California, Irvine; UCLA, University of California, Los Angeles; UCSD, University of California,
San Diego; UCSF, University of California, San Francisco; VA, Veterans Affairs; VM, Virtual Machine.
findings in its technology design and implementation.
SCANNER's original partners consisted of institutions with
highly diverse policies related to the use of EHRs for research. 
We produced a comprehensive comparison of the legal require- 
ments and differences among federal and state regulations for 
states involved in SCANNER. 13 We carefully documented health 
system privacy requirements obtained from institutional docu- 
ments as well as interviews with system leaders. Additionally, we 
interviewed patients and clinicians to understand their prefer- 
ences towards individual privacy. 6 20 The SCANNER team 
described the security and privacy standards used in CDRNs. 21 
We also conducted a systematic review of privacy technology 
used in CDRNs. 3 We codified data-sharing policies 13 — each insti- 
tution specified the policies to which it should adhere. For 
example, institutions that could share patient-level data were 
welcome to do so, while institutions that had to keep their 
patient-level data within their firewalls could share aggregate data 
such as coefficient estimates, which are equally useful in building 
multivariate models that span all institutions. 22 23 As other insti- 
tutions joined the network, only new policies were encoded. 
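
To make the notion of codified data-sharing policies concrete, here is a minimal sketch of a policy registry in which each site declares whether patient-level data or only aggregate data may leave its firewall. The class, field, and site names are hypothetical and do not reflect the actual SCANNER policy encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePolicy:
    site: str
    may_share_patient_level: bool  # may patient-level rows leave the firewall?
    may_share_aggregates: bool     # may aggregates (e.g. coefficient estimates) leave?


def is_permitted(policy, payload_kind):
    """Return True if the site's policy allows sending this kind of payload."""
    if payload_kind == "patient_level":
        return policy.may_share_patient_level
    if payload_kind == "aggregate":
        return policy.may_share_aggregates
    raise ValueError(f"unknown payload kind: {payload_kind!r}")


# Hypothetical registry: site_a may pool data, site_b shares aggregates only
registry = {
    "site_a": SitePolicy("site_a", True, True),
    "site_b": SitePolicy("site_b", False, True),
}


def sites_for(payload_kind):
    """List the sites that can take part in a study needing this payload kind."""
    return sorted(
        name for name, policy in registry.items() if is_permitted(policy, payload_kind)
    )


print(sites_for("aggregate"))
print(sites_for("patient_level"))
```

Adding a site then amounts to adding one registry entry, which mirrors the observation that only new policies need to be encoded as institutions join.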

We have architected pSCANNER so that authorized users can 
use a privacy-preserving distributed computation model and 
research portal that was successfully piloted in SCANNER. This 
portal includes a study protocol and policy registry — for each 
study, sites approve specific analytic tools, datasets, and proto- 
cols for data privacy and security. Role-based access controls 
(RBACs) corresponding to federal, state, institutional, and study- 
specific policies are encoded in the registry and enforced 
through SCANNER data access services. SCANNER's main goal
was to develop a set of highly configurable, computable policies 
to control data access and to develop effective methods to 
perform multisite analyses, without necessarily transferring 
patient-level data. 3 22 24 25 Traditional approaches to multisite 
research involve transferring patient-level data to be pooled for 
inferential statistics (eg, multivariate regression). While this can 
be supported, we conduct comparative effectiveness research 
with increased efficiency because SCANNER allows distributed 
regressions, thereby avoiding more complex IRB and data use 
agreements. SCANNER's analytic library was seeded with analysis
tools used and validated in multiple publications, 26-29 and
incorporated into the Observational Cohort Event Analysis and 
Notification System (OCEANS) (http://idash.ucsd.edu/dbp- 
tools#overlay-context=idash-software-tools) and Grid Binary 
Logistic Regression (GLORE), 22 which include multivariate ana- 
lysis methods that allow model fitting, causal inference, and 
hypothesis testing. When a study policy is created, the site prin- 
cipal investigator specifies the allowable analytic methods and 
transfer protocols. In particular, each study's site principal inves- 
tigator will determine if results must be held locally and 
approved by a delegated representative prior to release for transfer.
Our distributed system allows the construction and
evaluation of multivariate models that can be used for statistical
process control, data safety monitoring in clinical trials, adjust- 
ment for confounders, propensity score matching, risk predic- 
tion, and other methods used in PCOR (figure 3). 
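
The principle behind a distributed regression can be sketched in a few lines: each site computes aggregate quantities on its own rows, and only those aggregates are summed centrally. The example below fits a logistic regression by Newton-Raphson in that way. It is a simplified illustration in the spirit of GLORE, not the GLORE implementation; the function names and the two toy sites are assumptions, and a real deployment would add the access controls and transfer approvals described above.

```python
import math


def site_summaries(rows, beta):
    """Run locally at one site: return gradient and information matrix only."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        for i in range(d):
            grad[i] += (y - p) * x[i]
            for j in range(d):
                info[i][j] += p * (1.0 - p) * x[i] * x[j]
    return grad, info


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def fit(sites, dim, iterations=25):
    """Central node: sum per-site aggregates and take Newton-Raphson steps."""
    beta = [0.0] * dim
    for _ in range(iterations):
        grad = [0.0] * dim
        info = [[0.0] * dim for _ in range(dim)]
        for rows in sites:  # in a real network this would be a remote call per site
            g, h = site_summaries(rows, beta)
            grad = [a + b for a, b in zip(grad, g)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


# Hypothetical rows: ((intercept, covariate), outcome)
site_a = [((1, 0), 0), ((1, 1), 0), ((1, 2), 1), ((1, 3), 0)]
site_b = [((1, 1), 1), ((1, 2), 0), ((1, 3), 1), ((1, 4), 1)]

distributed = fit([site_a, site_b], dim=2)
pooled = fit([site_a + site_b], dim=2)  # what pooling the rows would give
print(distributed, pooled)
```

Because the gradient and information matrix are sums over patients, adding them across sites yields the same Newton-Raphson update as pooling the rows, so the distributed and pooled coefficients agree up to floating-point rounding.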

SECURE ACCESS 

The pSCANNER network partners have security policies 
already in place as follows. 

The SCANNER Central node is located within a secure Health 
Insurance Portability and Accountability Act (HIPAA)-compliant
environment and will migrate to the platform developed for the
NIH-funded Integrating Data for Analysis, Anonymization, and
Sharing (iDASH) center, which is now being modified to be
Federal Information Security Management Act (FISMA) certified.
For studies that require pooling of data, the hub will store
the data in iDASH and will use methods that were developed to 
protect privacy of individuals 3 30 31 and institutions from which 
the data originate. 22 23 25 32 Future work in the pSCANNER 
project will extend these capabilities to include infrastructure 
necessary for managing randomized clinical trials, including ran- 
domization, recruitment, and enrollment tracking systems. These 
will require modifications to the current RBAC model. Access to 
this environment is provided through a virtual private network, 
and we are implementing a two-factor authentication based on 
the RSA technology, which issues new keys every minute. These 
one-time keys are displayed on key fobs or through free apps as
soft tokens on smartphones. All protected health information 
rests behind the firewall of each institution. Requests are received 
by SCANNER software outside the firewall and transmitted to 
the Virtual Machine (VM) located inside the firewall according 
to predefined authorization rules. The SCANNER network soft- 
ware is compliant with National Institute of Standards and 
Technology RBAC, and applies best practices for RESTful web 
services to ensure that both data and role-based policy settings 
are not vulnerable to attack by adversaries. 33 34 All communica- 
tions are fully encrypted end-to-end, and all nodes participating 
in the network authenticate each other through X.509 certificate 
exchanges. 
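
The time-based one-time keys mentioned above can be illustrated with a short sketch. The RSA token algorithm is proprietary and is not reproduced here; instead, the example swaps in the open TOTP scheme (RFC 6238, built on HOTP from RFC 4226), with a 60-second time step chosen to mirror keys that change every minute. The function name and its defaults are assumptions for this example.

```python
import hashlib
import hmac
import struct


def one_time_code(secret, unix_time, step=60, digits=6):
    """Derive a time-based one-time code (TOTP, RFC 6238) from a shared secret."""
    counter = int(unix_time) // step  # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Shared secret taken from the RFC 6238 test vectors
secret = b"12345678901234567890"

# Within one 60-second window the code stays the same
same_window = one_time_code(secret, 120) == one_time_code(secret, 179)
print(same_window)

# RFC 6238 test vector: 30-second step, 8 digits, time 59
print(one_time_code(secret, 59, step=30, digits=8))
```

The server and the key fob or smartphone app share the secret and a clock, so both can derive the same code for the current time step without transmitting the secret.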

UC-ReX uses authentication through the UC campus Active
Directory of the requester. Users have to log in through a virtual
private network when they are not on campus. Since the 
network currently only provides results of count queries, all 
results are provided automatically, with the addition of some 
noise in the counts 35 36 to prevent users from uniquely identify- 
ing a specific patient through a series of queries. 
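
A minimal sketch of returning perturbed counts follows. The text states only that some noise is added; the use of Laplace-distributed noise, the scale value, and the function name are assumptions for illustration and may differ from the method in the cited references.

```python
import random


def noisy_count(true_count, scale=2.0, rng=random):
    """Return the count with zero-mean Laplace noise, floored at zero."""
    # The difference of two exponentials with mean `scale` is Laplace(0, scale)
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return max(0, round(true_count + noise))


# Repeating the same query gives slightly different answers each time
rng = random.Random(0)  # seeded only to make the example reproducible
answers = [noisy_count(100, rng=rng) for _ in range(2000)]
mean_answer = sum(answers) / len(answers)
print(answers[:5], mean_answer)
```

Individual answers differ from the true count, which makes it harder to isolate one patient by comparing overlapping queries, while the noise averages out to roughly zero so that cohort sizes remain useful for feasibility estimates.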

VINCI and the VA are transitioning to a two-factor authenti- 
cation system to authorize users. Currently, centrally managed 
username and password are required. Users can optionally use
personal identity verification (PIV) cards with passwords to
access VA systems. Soon, PIV
cards will be required VA-wide to access the VA network. Within



Figure 3 Distributed computing. 
Patient-centered Scalable National 
Network for Effectiveness Research 
(pSCANNER) answers an end-user's 
scientific question by distributing the 
corresponding query to each 
participating site, processing the query 
locally while preserving each site's 
stringent data privacy and security 
requirements, then aggregating the 
responses into a coherent answer. 



End-user > Central Site > Participating Site > Central Site > End-user 





Science Question > Distributed Query > Local Process > Aggregate > Science Answer 
a protected enclave of the secure VA network, VINCI hosts integrated
national data. Access to data is provisioned through an
electronic system that matches IRB protocol to datasets. VINCI 
has 105 high-performance servers and 1.5 petabytes of high- 
speed data storage with multiple layers of security. The remote 
computing environment enables data analysis to be performed 
directly on VINCI servers. Unless explicitly requested and insti- 
tutionally approved, all sensitive patient data must remain on 
VINCI project servers. VINCI staff audit all
data transfers out of the VINCI enclave for appropriateness.

SUMMARY 

Our network of three networks representing multiple health 
systems and diverse populations embodies the challenges and 
opportunities that PCORnet itself has to face. While we have 
based the design of our system on qualitative research and stake- 
holder engagement, in practice, adoption and success will 
depend on many factors. For example, pSCANNER will encode 
a significant portion of policies in software, use a flexible strat- 
egy to harmonize data, and use privacy-preserving technology 
that enables highly diverse institutions to join the network and 
allow stakeholders to participate. Significant challenges in terms 
of providing sufficient incentives for patients, clinicians, and 
health systems to participate and ensuring the sustainability of 
the network, which were not the focus of this article, will also 
need to be addressed. The pSCANNER project offers a unique 
opportunity to make progress toward these objectives, and share 
results with a community of researchers and representatives 
from a broader group of stakeholders. It represents a unique 
opportunity to reaffirm our goals: our health systems have 
public service as their primary mission, from both a healthcare 
and an educational perspective. pSCANNER is itself a reflection 
of this mission, teamwork, and focus on patient outcome 
research to improve health. 